Unsupervised Graph-Based Entity Resolution for Complex Entities

Abstract

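Entity resolution (ER) is the process of linking records that refer to the same entity. Traditionally, this process compares attribute values of records to calculate similarities and then classifies pairs of records as referring to the same entity or not based on these similarities. Recently developed graph-based ER approaches combine relationships between records with attribute similarities to improve linkage quality. Most of these approaches only consider databases containing basic entities that have static attribute values and static relationships, such as publications in bibliographic databases. In contrast, temporal record linkage addresses the problem where attribute values of entities can change over time. However, neither existing graph-based ER nor temporal record linkage can achieve high linkage quality on databases with complex entities, where an entity (such as a person) can change its attribute values over time while having different relationships with other entities at different points in time. In this article, we propose an unsupervised graph-based ER framework aimed at linking records of complex entities. Our framework provides five key contributions. First, we propagate positive evidence encountered when linking records to use in subsequent links by propagating attribute values that have changed. Second, we employ negative evidence by applying temporal and link constraints to restrict which candidate record pairs to consider for linking. Third, we leverage the ambiguity of attribute values to disambiguate similar records that, however, belong to different entities. Fourth, we adaptively exploit the structure of relationships to link records that have different relationships. Fifth, using graph measures, we refine matched clusters of records by removing likely wrong links between records. We conduct extensive experiments on seven real-world datasets from different domains, showing that on average our unsupervised graph-based ER framework can improve precision by up to 25% and recall by up to 29% compared to several state-of-the-art ER techniques.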
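The traditional pairwise approach that the abstract contrasts itself with (compare attribute values, compute similarities, classify each pair) can be illustrated with a minimal sketch. This is not the article's framework: the toy records, the attribute names, the use of `difflib` as the similarity function, the unweighted mean, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; none of these values come from the article.
RECORDS = [
    {"id": "r1", "name": "Mary Smith", "address": "12 High Street"},
    {"id": "r2", "name": "Mary Smyth", "address": "12 High Street"},
    {"id": "r3", "name": "John Brown", "address": "4 Mill Lane"},
]

ATTRIBUTES = ("name", "address")
MATCH_THRESHOLD = 0.85  # assumed cut-off, not taken from the article


def attribute_similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] using difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict) -> float:
    """Unweighted mean of the per-attribute similarities."""
    sims = [attribute_similarity(r1[a], r2[a]) for a in ATTRIBUTES]
    return sum(sims) / len(sims)


def classify_pairs(records, threshold=MATCH_THRESHOLD):
    """Return the id pairs whose record similarity reaches the threshold."""
    matches = []
    for r1, r2 in combinations(records, 2):
        if record_similarity(r1, r2) >= threshold:
            matches.append((r1["id"], r2["id"]))
    return matches


print(classify_pairs(RECORDS))
```

A classifier of this kind treats every pair independently and assumes attribute values are static, which is the limitation the abstract raises for complex entities whose values and relationships change over time.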

Similar resources

A Graph-Theoretic Fusion Framework for Unsupervised Entity Resolution

Entity resolution identifies all records in a database that refer to the same entity. The mainstream solutions rely on supervised learning or crowd assistance, both requiring labor overhead for data annotation. To avoid human intervention, we propose an unsupervised graph-theoretic fusion framework with two components, namely ITER and CliqueRank. Specifically, ITER constructs a weighted biparti...

Unsupervised Named Entity Resolution

Resolving the ambiguity of person, organisation and location names is a challenging problem in the Natural Language Processing (NLP) area. This problem is usually formulated as a clustering problem, in which the target is to group mentions of the same entity into the same cluster. In this paper, we present a different approach based on the Distributional Hypothesis and edit distance, which asso...

Unsupervised Ranking Model for Entity Coreference Resolution

Coreference resolution is one of the first stages in deep language understanding and its importance has been well recognized in the natural language processing community. In this paper, we propose a generative, unsupervised ranking model for entity coreference resolution by introducing resolution mode variables. Our unsupervised system achieves 58.44% F1 score of the CoNLL metric on the English...

Graph-based Approaches for Organization Entity Resolution in MapReduce

Entity Resolution is the task of identifying which records in a database refer to the same entity. A standard machine learning pipeline for the entity resolution problem consists of three major components: blocking, pairwise linkage, and clustering. The blocking step groups records by shared properties to determine which pairs of records should be examined by the pairwise linker as potential du...

Unsupervised Models of Entity Reference Resolution

Journal

Journal title: ACM Transactions on Knowledge Discovery From Data

Year: 2023

ISSN: 1556-472X, 1556-4681

DOI: https://doi.org/10.1145/3533016